Quiz
Assignments
Question. What issues did you find in these lines of code?
1: def Alice_sign(message):
2: secret_key = randint(0, 2**64) # sample uniformly at random
3: public_key = PublicKey.from_sk(secret_key) # compute Alice's public key
4: random_k = int(time.time()) # sample one-time randomness
5: signature = sign(secret_key, message, random_k) # run ECDSA.sign
6: return public_key, message, signature # send to Bob over the Internet
7: def Bob_verify(public_key, message, signature): # received from Alice over the Internet
8: return verify(public_key, message, signature) # run ECDSA.verify, return the result
By the end of today's lecture, you will learn:
Bitcoin was invented in 2009 by an anonymous party (or parties) holding the pseudonym Satoshi Nakamato.
At the heart of Bitcoin is a blockchain, which is an append-only ledger that is stored by anyone on the Internet that uses the Bitcoin software.
Source: Wikipedia
The most important property of the Bitcoin blockchain is that all participants eventually reach agreement over the state of the ledger. If I believe that Alice owns 3 coins, then (eventually) so should you.
In order to promote long-term agreement and stability, Bitcoin purposely sacrifices short-term speed and efficiency.
There are three types of participants within Bitcoin.
A light client only needs to hold one or more secret keys for a digital signature scheme (plus a small amount of data that we will discuss today).
A node additionally stores the entire state of the blockchain.
A miner additionally tries to write new blocks on the blockchain. For their service, miners receive 6.25 new Bitcoins plus the fees inside of every transaction within their block.
Cryptocurrencies like Bitcoin have been carefully designed to ensure that censorship, inconsistency, and double spending attacks are difficult to perform and not cost-effective.
Bitcoin achieves consistency through its rule that miners and nodes should always choose the largest blockchain (or more precisely, the one with the most cumulative difficulty), with the tiebreaker being to choose the chain you heard first.
This rule incentivizes miners to post their newly minted blocks as soon as possible, and combined with the randomized election system ensures that inconsistent forks in the blockchain do not last very long.
[Image source: Mango Research]
nonce such that the hash of the block begins with a large number of 0s. The puzzle friendliness property of SHA-256 says that each miner has no better strategy than to guess the nonce at random. Eventually someone will find and post a winning nonce in a block.[Image source: blockchain.info]
Remember that a blockchain contains individual transactions, which are placed into blocks, which are themselves linked together into a blockchain.
Nodes and miners in the system maintain a local copy of the blockchain, and they should check the validity of each new block when it is created and gossiped around the network.
Overall, here is the high-level idea of Bitcoin's consensus algorithm (adapted from Abhishek Jain's slides):
In more detail, for an individual transaction to be valid, it must:
Verify function) from everyone who is sending money.This is why Bitcoin is said to follow an unspent transaction (UTXO) model.
If I show a single transaction to a light client, then they can check properties 1 and 2 on their own, but they cannot check property 3!
The third property requires knowledge of the previous transactions in the blockchain, so only nodes and miners can verify this.
For a block to be valid, it must:
nonce that satisfies the difficulty rule (i.e., where the hash of the entire block begins with some large number of 0s).(There are some other constraints in the Bitcoin software beyond the ones we've described here, but these are the most important ones that we will focus on.)
Once again: if I hand an individual block to a light client, they can check property 2 on their own, but not property 3.
Only a full nodes and miners can check the validity of all transactions and blocks, since they have stored a local copy of the entire blockchain.
We'll come back to the light clients later today. First though, let's summarize what we have learned about the security guarantees of a blockchain.
Bitcoin relies on the following assumption:
The majority of the computational power used for mining (at any given time) is controlled by honest miners.
Here, the term "honest" means someone who runs the unmodified Bitcoin software, plays by the rules, and doesn't try to double-spend or otherwise cheat the system.
If this assumption holds, then a leader election based on proof of work (eventually) stops double-spending attacks. That's because a dishonest miner cannot keep up with the raw computing power of the honest majority.
On the other hand: if an attacker actually has the majority of the hash power, then they can solve proof of work puzzles faster than the honest miners and can therefore rewrite history!
Note that this assumption is different than the "anytrust" assumption, which was all that we needed to ensure censorship resistance.
To see why Bitcoin needs an honest majority of computing power, let's use the same forking picture as before, but change the scenario as follows:
D has been posted.B.[Image source: Mango Research]
To do this, the miner will need to post many proofs of work in a row, very quickly. It needs to create $3 + N$ blocks faster than the real world produces $N$ blocks.
This is a Markov chain problem!
Nakamoto's whitepaper includes the following function (which I translated from C++ into Python). It calculates the probability that an attacker who controls a q fraction of the computing power of the network can execute a double-spend attack to overwrite the last z blocks in the blockchain.
# Code adapted from the Bitcoin whitepaper at https://bitcoin.org/bitcoin.pdf
from math import exp, pow
def AttackerSuccessProbability(q: float, z: int):
p = 1.0 - q
poisson_lambda = z * (q / p)
sum = 1.0
for k in range(z+1):
poisson = exp(-poisson_lambda)
for i in range(1, k+1):
poisson = poisson * poisson_lambda / i
sum = sum - poisson * (1 - pow(q / p, z - k))
return sum
q = 0.1
print("q =", q)
for z in range(10):
print("z =", z, "\t Prob =", format(AttackerSuccessProbability(q, z),"1.7f"))
print("")
q = 0.3
print("q =", q)
for z in range(0, 55, 5):
print("z =", z, "\t Prob =", format(AttackerSuccessProbability(q, z),"1.7f"))
q = 0.1 z = 0 Prob = 1.0000000 z = 1 Prob = 0.2045873 z = 2 Prob = 0.0509779 z = 3 Prob = 0.0131722 z = 4 Prob = 0.0034552 z = 5 Prob = 0.0009137 z = 6 Prob = 0.0002428 z = 7 Prob = 0.0000647 z = 8 Prob = 0.0000173 z = 9 Prob = 0.0000046 q = 0.3 z = 0 Prob = 1.0000000 z = 5 Prob = 0.1773523 z = 10 Prob = 0.0416605 z = 15 Prob = 0.0101008 z = 20 Prob = 0.0024804 z = 25 Prob = 0.0006132 z = 30 Prob = 0.0001522 z = 35 Prob = 0.0000379 z = 40 Prob = 0.0000095 z = 45 Prob = 0.0000024 z = 50 Prob = 0.0000006
This code shows that even if a dishonest miner controls 10% of all the hash computing power on the blockchain, it will only have about a 1.3% chance of succeeding at double-spending a transaction from z = 3 blocks ago.
The attacker can increase its chance of success by buying or renting more computational power.
In response, defenders can resist this attack by waiting for more blocks to be confirmed in the public, honest chain.
Suppose there is a transaction that pays you money, but you don't trust the sender and you want to be confident that the money is "safe" and cannot be taken away from you through a double spend attack. You want to be 99.9% confident that your money is safe (i.e., that a double-spend succeeds with at most 0.1% probability).
The Nakamoto whitepaper includes the following table that shows how many blocks you need to see mined after your block, as a function of the dishonest miner's computational power.
| Adversary's computational power | # blocks required |
|---|---|
| 10% | 5 |
| 15% | 8 |
| 20% | 11 |
| 25% | 15 |
| 30% | 24 |
| 35% | 41 |
| 40% | 89 |
| 45% | 340 |
There are two important takeaways from this table:
The hash power required to execute a double-spend attack quickly becomes onerous enough to be not worth it, for all but the most expensive transactions. Even if the attacker controls even 10% of the Bitcoin network's computing power, executing a double spend attack will likely take more than one hour. By contrast, if the miner uses this hash power honestly to create new blocks, it could have earned (in expectation) tens of thousands of dollars in mining fees.
On the other hand, if the attacker controls the majority of the hash power, then your money is never safe; even transactions made years ago could eventually be double-spent. This is called a 51% attack.
Suppose you are Bob, and you see that Alice is sending a transaction on the Bitcoin peer-to-peer network to pay you $1.
B, you still shouldn't believe that the money is yours because a fork could happen and this block might not wind up on the longest chain.[Image source: Mango Research]
B containing your transaction, you still might not believe that the money is yours because Alice might be trying to run a double-spend attack on the forked chain.[Image source: Mango Research]
Definition. A node considers a block to be finalized if it is at least $k$ blocks deep inside of the longest chain. We'll declare a transaction to be finalized if it is part of a finalized block. (The choice of $k$ here affects the probability of an erroneous decision.)
Overall, Bitcoin's proof-of-work blockchain achieves the following security goals as long as a majority of the computing power is honest (i.e., follows the rules).
As a simple heuristic, many people consider Bitcoin transactions to be finalized once they are 6 blocks deep in the blockchain. At that point, an attacker with 10% of the mining power has only probability $p = 0.0002428$ of rewriting history.
Note that we have not yet provided any security guarantees to the light clients. This is intentional, since light clients do not have the ability to verify transactions and blocks
In order to achieve these goals, Bitcoin is purposely designed to be:
In Part 3 of this course, we will explore alternative ways to satisfy the liveness and safety goals with different tradeoffs.
Let's revisit those light clients, and see how we can extend liveness and safety to them.
Remember that a light client doesn't have the hard drive space to store the 500+ gigabytes of data in the Bitcoin blockchain. Still though, they can periodically connect to the Internet and download some data.
It would be great if we could compress all transactions into a small dataset. Unfortunately, that's impossible.
Instead, we can leverage the facts that:
The clients just need to verify the accuracy of the transactions -- that is, to make sure that the nodes are telling them the truth. To do that, we can use a Merkle tree.
Remember what Merkle trees do:
In more detail: we construct a Merkle tree as follows.
Source: Alin Tomescu, Decentralized Thoughts blog
If you store one 32-byte hash digest corresponding to the set of files $f_1, f_2, \ldots, f_k$, then someone else can prove to you that one file (say, $f_3$) is contained within the set $S$ represented by the Merkle tree.
Source: Alin Tomescu, Decentralized Thoughts blog
How does this relate to Bitcoin? Well, if you think about the transactions in blocks as "files," we can do something similar here.
To support light clients, we are going to make two changes to the way that Bitcoin works. The first change applies to everyone, and the second change applies only to light clients.
First, each block will contain the Merkle root of all transactions within the block. The block itself will also contain all transactions, as before.
We will call the block header as just the following items: the nonce of the current block, the root of a Merkle tree for all transactions within that block, and the pointer to the previous block's hash.
Because the Merkle root binds together all of the transactions, it now suffices to create a proof-of-work puzzle where the hash is only taken over the header, and not the transactions themselves.
[Image source: Nakamoto whitepaper]
A light client will follow the same rule as nodes: find the blockchain of the largest height (or more accurately, the most cumulative difficulty). But the light client only downloads the block headers, not the full blocks (i.e., they won't know the transactions).
This is substantially smaller:
With this info, a light client can:
Note that a light client cannot verify:
They must rely on the nodes to perform verification check #1, and on their own memory for property #2.
This makes light clients more vulnerable to attackers than nodes. For instance, a malicious miner can just create a single "bad" block that contains invalid signatures; nodes will reject it but a light client will be fooled.
For this reason, light clients should be even more careful when finalizing a transaction.
Eventually though, light clients will inherit the liveness and safety guarantees from the nodes.